Augmented Sparse Reconstruction of Protein 
Signaling Networks. 

D. Napoletani a ^ T. Sauer c , D. C. Struppa f E. Petricoin a , L. Liotta a 



Abstract 

The problem of reconstructing and identifying intracellular protein sig- 
naling and biochemical networks is of critical importance in biology today. 
We sought to develop a mathematical approach to this problem using, as 
a test case, one of the most well-studied and clinically important signaling 
networks in biology today, the epidermal growth factor receptor (EGFR) 
driven signaling cascade. More specifically, we suggest a method, aug- 
mented sparse reconstruction, for the identification of links among nodes 
of ordinary differential equation (ODE) networks from a small set of tra- 
jectories with different initial conditions. Our method builds a system 
of representation by using a collection of integrals of all given trajecto- 
ries and by attenuating block of terms in the representation itself. The 
system of representation is then augmented with random vectors, and 
h minimization is used to find sparse representations for the dynamical 
interactions of each node. Augmentation by random vectors is crucial, 
since sparsity alone is not able to handle the large error-in-variables in 
the representation. Augmented sparse reconstruction allows to consider 
potentially very large spaces of models and it is able to detect with high 
accuracy the few relevant links among nodes, even when moderate noise 
is added to the measured trajectories. After showing the performance of 
our method on a model of the EGFR protein network, we sketch briefly 
the potential future therapeutic applications of this approach. 

Keywords: sparse representations, protein interaction models, bio- 
chemical pathways. 

1 Introduction 

The problem of reconstructing a network of interacting variables from a small set 
of data generated by the network itself has attracted considerable attention es- 
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pecially since this problem arises so naturally in genomics, proteomics and more 
generally system biology problems (see for example [Voit 2000 , [Cho u et al. 2006] , 
[Husmeier 2003] . [Rogers et al. 2005] , [Nachman et al. 2004] . [Gardner et al. 2003] ). 
In particular, the ability to reconstruct and identify intracellular protein sig- 
naling and biochemical networks is of critical importance in modern biology. 
However, the ability to dynamically measure and collect enough data from ev- 
ery protein/node within the network is impossible with current methodologies. 
We sought to develop a mathematical approach to this problem using one of 
the most well-studied and clinically important signaling networks in biology 
today, the epidermal growth factor receptor (EGFR) driven signaling cascade 
|Araujo et al. 2005| . 

Interestingly, it is widely believed, and proven in some cases, that biological 
networks are scale free networks with a few variables (hubs) very connected to 
many others and most variables interacting only with a few others [Albert 2005] . 
Even the hubs do not interact with more than a dozen other variables in most 
reliable models, so that effectively we can say that these networks are sparse, 
with respect to the total number of all possible connections among variables. 
Such information can greatly help in reconstructing the network itself, as shown 
in [Gardner et al. 2003] . |Yeung et al. 2002] , |Tegner et al. 2003| . 

Many current algorithms to reconstruct networks from expression data are 
based on the application of powerful Bayesian methods after the seminal work in 
[Friedman et al. 20"00] . but, as noted in [Rogers et al. 2005] (see also [Zak et al. 2002] ). 
these methods do not perform well with the limited amount of data that can 
be generated by microarray technologies. This limitation is especially perti- 
nent for protein expression data. The other widely used approach for network 
reconstruction is based on parameter estimation of dynamical system models 
of the networks themselves [Voit 200"0] , The fundamental difficulty of such ap- 
proach is the very large number of parameters and reaction rates that need to 
be estimated [Chou et al. 2006] . and this, again, leads to an inability to work 
efficiently with the limited data generated by microarrays and time series of 
expression profiles. Another viable alterative when analyzing microarray data 
is to simply perform some type of clustering analysis such as hierarchical or 
K-me&ns clustering [Kaufman et al. 2005], or the recent exemplars clustering 
technique Fre y et al. 2007] . Clustering techniques do not require very large 
data sets to be applied, but they only identify similarly activated variables, and 
do not provide a causal understanding of the network structure. 

To address the need for specialized network reconstruction methods that can 
work for the limited data generated by experiments, we restrict our attention 
in this work to ordinary differential equation (ODE) models of protein signaling 
networks of the form x — f(x), where x is the vector of variables in the system 
and x its componentwise derivative. While assuming the plausibility for biolog- 
ical networks of a dynamical system model is a well established approach in the 
literature ( [Voit 2000] . [Chou et al. 2006] ). we believe that exact modeling and 
parameter estimation for such models is not the most efficient way to find how 
quantities interact in biological systems, partly because parameter estimation is 
so difficult in noisy environments and with small data sets to work with. In addi- 
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tion, biological systems adapt their very network structure over time, especially 
in the presence of diseases. It is likely more effective to search for equivalent, 
indistinguishable, classes of models [Judd et al. 2004] that project to the same 
network structure, in the sense that they give rise to trajectories that are quali- 
tatively similar and they have similar overall topology of the connections among 
nodes 

With this general approach in mind, we ask in this paper the following ques- 
tion: whether the structure of sparse ODE networks can be inferred from a 
small set of trajectories with different initial conditions generated by the sys- 
tem itself. We show that, for a specific realistic case of ODE modeling of protein 
networks, it is possible to expand and adapt ideas from the theory of sparse re- 
gression (lasso) and signal reconstruction by l\ minimization ( Tibshirani 1996 , 
[Chen et al. 1998] . [Hastie et al. 200 lj chapter 3, [Donoho 2006a] 1. to develop 
a method that reconstructs a significant portion of these networks with good 
accuracy even in the presence of moderate noise of intensity up to 20% of the 
maximum values of the trajectories. Our method builds a system of representa- 
tion by using a collection of integrals of all given trajectories and by attenuating 
block of terms in the representation itself. The system of representation is then 
augmented with random vectors, and ?j minimization is used to find sparse 
representations for the dynamical interactions of each node. Augmentation by 
random vectors is crucial in the context of network reconstruction, since spar- 
sity alone is not able to handle the large error-in- variables in the representation 
when trajectories are very noisy. 

One of the main strengths of our method is the ability to sharply distinguish 
relevant links, so that the rate of false links that are detected can be made very 
low. This is important in practice since it is difficult and expensive to follow up 
and validate experimentally potential links among proteins that are inferred by 
computational means [Hu et al. 2006] . 

The paper |Yeung et al. 2002] is a significant antecedent to our work, since in 
that paper the authors use a hybrid singular value decomposition (SVD) and l\ 
minimization to find a sparse linear model that fits oligonucleotide microarray 
data. The l\ minimization is used in that paper as a postprocessing of the 
reverse-engineering performed by the SVD. A similar preconditioning for large 
models is implemented in the recent paper Debashis et al. in press and applied 
to several examples including microarray data. In this paper we will show 
how h minimization methods can be modified to directly approach network 
reconstruction and model identification problems, without any preprocessing, 
for realistic, very limited sampling of the data and significant noise levels, even 
when large spaces of non-linear models are considered. Moreover the results for 
the EGFR network show that our method can recover the topology of relatively 
small protein networks. We do not require explicit estimation of the noise level 
in the trajectories and we do not need multiple trajectories with same initial 
conditions to estimate the true trajectories in the presence of noise. 

In section 2 we show how to apply l\ minimization methods to recon- 
struct sparse ODE networks, stressing the specific steps that are necessary 
in the network setting. In section 3 we apply the algorithm sketched in sec- 
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tion 2 to one particular protein network, the EGFR model as described in 
|Araujo et al. 2005 . 

We do not explore the biological significance of this network, strongly related 
to cell proliferation, but we mention here that it plays a significant role in cancer 
development [Lacouture 2006] , so it is considered an ideal target for fine tuned 
potential therapies that do not impact the body at the systemic level. We 
will briefly mention some possible directions of research related to the medical 
applications of network mapping at the end of the paper. 

2 Methods 

2.1 Sparse Signal Processing 

Suppose we have a discrete function F(n), n = 1,...,N and a collection of 
functions Q — {gi(n), ■■■,gM{ n )i n = 1, N} with M >> N. Then in general 
the representation of F in terms of Q will not be unique, meaning that there will 
be many ways to write F as F(n) = ^ m=1 a m g m (n), n = 1, ...,N. An important 
question when trying to extract the significant features of F with respect to Q is 
to find, among the many possible representation of form as in equation (1), the 
one that is the most 'sparse', that is the representation that has as many zero 
coefficients a m as possible. This problem is in general very difficult, but we can 
use linear programming techniques to find approximate sparse representations, 
that is, representations that have just a few large coefficients and many very 
small ones. 

We briefly introduce this type of approximation to sparse solutions here 
following mostly [Mallat 1998], section 9.5.1 and we refer to [Tibshirani 1996] . 
jChen et al.1998] . [Hastie et al. 2001] , [Donoho 200"6a] and [Donoho 2006b] for 
a thorough analysis of the relations between l\ optimization and sparsity. The 
key idea is to realize that if we minimize the 1-norm of the coefficients \a\ = 
J2m=i l a m| 5 this implies that the total energy of the coefficients is concentrated 
in just a few of them. We can gain an intuition on this by noting that a 
minimization of the 1-norm reduces cancellations among different elements of 
G, since these cancellations increase the 1-norm. 

Note that the problem minQ^ m=1 \a m \), subject to F{n) — J2 m =i a m9m{n), 
n = 1,...,N, is equivalent to the problem nrinQ^p =1 x p ), subject to F(n) — 

Epii x m g m (n) - Y? p =m+i x mg m (n), with x p > for every p = 1, 2M and 
x p — Xp+M = o-p- The linear optimization problem defined by the last two 
equations can be easily put in the standard format of linear programming prob- 
lems, so that a solution can be quickly obtained using one of several algorithms 
|Lustig et al. 1994] . 

Given therefore a discrete signal of length N and a collection of M signals Q 
with M » N we can easily find approximate sparse representations for F in Q. 
This result, first exposed in [Chen et al. 1998] . was the inspiration of a series of 
works that showed the great potential of l\ minimization in signal processing, 
see for example the recent work in jCandcs et al. 2004] . [Candes et al. 2006] . 
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Regression using l\ optimization has also been used extensively and to great 
effect in statistical learning for model identification under the name of lasso, 
after the pioneering work in [Tlbshiraiii 1996 , see [Hastie et al. 2001] chapter 
3 for a up to date review of use of the technique. 

We will see in the next subsection the crucial adjustments that are required 
to make l\ optimization effective and robust in network reconstruction problems. 

2.2 Augmented Sparse Networks 

Let us now write explicitly the general form of the dynamical systems of in- 
terest. Signaling networks arising from protein interactions, even though of- 
ten nonlinear, are often modeled with differential equations that contain sim- 
ple analytical forms that contain power function terms of variables x±, ...,Xn 
of the type x a = x" x x% 2 ■■■x" n , and hyperbolic terms of the type c °^ x _ , tha t 
take into consideration the presence of slow enzymatic kinetics (see [Voit 2000 , 
|Araujo et al. 2005] and references therein). The assumption of sparsity of links 
among the nodes implies that many of the at are actually zero. In this paper 
we assume for simplicity that the right hand side of the dynamical system that 
we try to model has polynomial terms up to degree d = 2, and hyperbolic terms 
c ^ x . , C > 0, more specifically we sample C at uniform intervals of length c in 
a range [0, Sc] of interest with S some large positive integer. If we denote by 
the time derivative of Xi, we consider models of the form: 

N N N N S 

i—l i— 1 j—1 i—1 s—1 

where n — 1,...,N. We can in principle consider a model that contains on 
the right had side all monomials x™* , all binomials x™' x" 3 , all the way to 
X X 2 ■ ■ ■ X " , where the exponents have norm |o^| less than a constant A and we 
assume a uniform sampling of the exponents. This would be a general setting 
compatible with the modeling approach taken in Voit 2000 , however, the main 
complication of such general models is already severe in our quadratic model 
with hyperbolic terms: when we have many nodes in the network, the combina- 
torial explosion of terms makes parameter fitting very difficult in the case only 
a limited amount of data is available on the dynamics of each node. 

A possible way to approach the fitting problem implicit in equation (1) is to 
find the model that minimize the l\ norm of the parameters of the terms in the 
equation. From the background material summarized in the previous subsection, 
we know that l\ optimization leads to a sparse representation of signals with 
very few terms with non-zero parameters, and that the optimization itself can 
be performed with linear programming techniques [Chen et al. 1998] . 

Since we noted in the introduction that actual biological networks are gen- 
erally sparse, the l\ fitting method should, in principle, improve our ability to 
find the actual links among nodes. Exact parameter fitting is difficult in this 
case as well and we will see in the results section that direct application of the 
l\ fitting as used in signal processing leads to very poor results. 
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To be specific, we assume that we sample variables x\, ...xn and that we 
have several trajectories x n>r r = 1, R with R different initial conditions. We 
denote by x\, xn the respective derivatives at each of the sampled points. If 
we write X n = [x n< i, x n .ii], X n = [x n< i, x n< n], and we denote by J the unit 
vector of same length as Xi, a formal substitution in equation (1) of x n with 
X n and x n with X n leads in effect to a problem of representation of discrete 
signals X n in terms of the collection of signals X = {J,Xi, XjX^, S ^ X - } w i* n 
i,j, k = 1, N, s = 1, ...,S. This way of stating the problem of reconstructing 
a specific network of the form (1) highlights the potential of applying the l\ 
sparsity techniques to recover the effective system from a collection of different 
trajectories. Now the first requirement for applying the l\ method is to have an 
underdetermined system with the cardinality M of X such that M >> L, where 
we denote by L the length of vectors in X. However a direct application of l\ 
optimization to the network data will not work in the presence of high noise and 
for very limited data. As much as sparsity is a powerful device to explore signal 
representations, it is not able by itself to deal with the large error-in- variables in 
the representation generated by the system trajectories. There are some crucial 
modifications that are necessary to get useful reconstruction results on protein 
networks, we term them model augmentation, attenuation of blocks of terms, 
and integral modeling. 

Model Augmentation with Random Terms: The main issue that pre- 
vents an accurate reconstruction of the network is the presence of noise in the 
trajectories. Because of such noise the representation system needs to account 
for large errors-in-variables (Voss et al. 2004] when fitting the models on the 
noisy data. To gain a better sense of this problem, denote the noisy mea- 
surements of Xi and Xj as Xi = Xi + Ni and Xj = Xj + Nj respectively, 
and assume that the differential model includes a term XiXj in the repre- 
sentation of some X n . This means that when we represent X n in X we do 
not just want a sparse representation, but we would like the term XiXj to 
appear in that specific representation with large non zero coefficient. Now 
XiXj = XiXj + XiNj + XjNi + NiNj, and we would like the noisy residue 
XiNj + XjNi + NiNj to 'disappear', i.e. to contribute marginally to the l\ opti- 
mization. The way we approached this problem is to go from the representation 
in equation (1) to a representation: 

N N N 

X n = a Q + J2^ x * + J2Y. hi i X * X J+ ( 2 ) 

i=l i=l j=l 

N S Y G 

^ ^ sc + Xi ^ 9 

i—l s—1 9=1 

where n g g = 1, .., G are discrete random vectors normally distributed, scaled to 
have norm 1 . We want G much larger that L so that the energy of noisy residues 
like XiNj + XjNi + NiNj is uniformly distributed among all the random vectors 
n g , and the overall contribution to the l\ norm of these noisy residues is small. 
Moreover a large value of G improves the conditioning of the corresponding lin- 
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ear programming problem and therefore the speed of convergence to the optimal 
solution. Note that G is dependent on the particular instance of problem that is 
given, and more specifically on the type and number of trajectories and sample 
points in each trajectory but the performance of the method we describe in this 
section is not strongly dependent on its specific value, as long as G » L. In 
Figure 5 we show numerical evidence of the significance of adding random terms 
in the context of the epidermal growth factor receptor case study. 

This extension of the basic model has far reaching consequences, since it 
assures that the new models are large enough to be able to perform an approx- 
imate sparse minimization, strongly retaining the dependence from the original 
terms of the 'effective', non-random model, while diffusing any potential noise in 
the data among the random terms of equation (2). The non-random portion of 
the matrix derived from the ODE network itself can be very ill conditioned. In 
particular, the hyperbolic terms generated by the same variable will be highly 
correlated among each other. There is an intrinsic inability to fully control the 
representation matrix generated by the trajectories and the error in variables 
that are bound to appear when trajectories are very noisy. This is a distinct 
characteristic of ODE reconstruction networks and one that makes this work 
diverge in methodology and outlook from standard l\ signal reconstruction. 

Attenuation of Block of terms: We seek to have reconstructed models 
with low complexity, that is, with terms of low degree, so it is useful to enforce a 
way to explicitly suppress the terms belonging to more complex blocks of terms 
such as quadratic and hyperbolic ones. The large number of quadratic and 
hyperbolic terms increases the chance, in a noisy setting, that several wrong 
terms from these blocks are selected in the representation of each node. By 
suppressing each of these blocks of terms we reduce the chance of this wrong 
selection and we give more weight to linear terms. Such suppression of higher 
complexity terms can be done by using suitable attenuation coefficients. More 
specifically, we choose to attenuate uniformly all terms in a block by a factor 
< [3 < 1. Assuming that all vectors of the collection of terms X were scaled 
to have li norm equal to 1, we effectively multiply their inner product with 
any signal by 4, which is bigger than 1, so the l\ optimization will have the 
tendency to select fewer of them to chose the representation with the minimal 
l\ norm. This is another interesting point specific to the modeling of networks. 
Empirically, we find that this adjustment is important for obtaining the very 
best results in the reconstruction of the geometric structure of the network, for 
example an attenuation of a factor of /3 = 0.5 for both quadratic and hyperbolic 
terms was near optimal for the epidermal growth factor receptor network. The 
need of some attenuation is especially strong when we want very few selected 
false links and the trajectories are very noisy. We find that a wide range of 
small values of f3 gives similar reconstruction results, but the optimal selection 
of (3 for each different block, including a possible attenuation for the block of 
random terms, is an open problem and we will explore numerically this issue 
in a separate paper. Essentially, these attenuation coefficients are one more 
device to keep the errors- in- variables from generating false links in the computed 
representation of each node, assuming that low degree and low complexity terms 
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are to be preferred. 

Integral Modeling: In a realistic reconstruction setting we have few sam- 
ple points and a relative noise that can be as high as 20% in the measured 
trajectories, making the estimation of the derivatives very difficult. This prob- 
lem transcends the specifics of our approach and is a key issue in the study 
of experimentally generated time series. Note that to use l\ optimization, we 
clearly do not need to use only local differential information. To avoid the prob- 
lem of direct estimation of derivatives in the highly noisy cases, we note that 
the equations as (1) can be written for n = 1, N, in integral form as: 

x n (t) - x n (t ) = dp + aj / Xidt+ (3) 

„■ — i '-'to 



i=l Jt « 
N N t N S t 

+y^y^ba / xiXjdt +y^y^ / — - — dt. 



This integral representation avoids the implicit problem of finding a good esti- 
mation of the derivative from a limited number of samples of the trajectories. 
The relative noise in the measurement of x n {t) — x n (t ) is comparable with the 
relative noise of the time series x n itself when t is far from to- Moreover, for 
biological signals derived from proteomics and genomics, variables often repre- 
sent intensity or concentration profiles that always assume positive values and 
therefore, in these cases, we expect the integrals on the right hand side to be 
dominated by the integrals of the true values of the variables, when zero mean 
noise is added to them. A further advantage of integral modeling is that we can 
easily estimate multiples of the integrals on the right hand side of equation (3) 
by summing up the samples that are given from to to t, if sampling is uniform. 
If sampling is not uniform, which is very often the case for experimental data, 
we can scale the contribution of each summand multiplying by the size of the 
corresponding sampling interval. 

Note that the constant term do was used simply as a term to correct po- 
tential biases in (1), as it does not carry information on the nodes' links, so we 
use it similarly in (3) and we do not take its integral. The augmentation by 
random terms and the attenuation of blocks of terms clearly can be applied to 
the integral representation (3) as well. The system representation that takes 
into account random augmentation, attenuation of blocks of terms and integral 
modeling is the following: 



x n (t) - x n (t ) = ao + ^di Xidt+ (4) 



N N t N S t G 

/ x iXj dt + (3 h J2Yl / -T^ dt + H n a- 

where (3 q and fih are positive attenuation coefficients for quadratic and hyper- 
bolic terms, both smaller than 1. This representation is used in the actual 
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network reconstruction algorithm that we are going to describe now. 
2.3 Network Reconstruction Algorithm 

The observations in the previous subsection can be gathered into a simple recon- 
struction algorithm based on h optimization, which we call augmented sparse 
reconstruction. We label the variables involved in a slightly different way in the 
algorithm to highlight the flexibility in the choice of the input for the algorithm. 
Given trajectories from a sparse system that is believed to be of a certain generic 
form, for each discretely sampled trajectory X nyT , r = 1, let X n _ r be the 

vector X n ^ r (t) — X n _ r (t ) where t takes all sampled values. Moreover, for a given 
vector g(t), t — i , i i; let 1(g) be the vector whose Z-th component is the sum 
X^i=o an d let J denote the unit vector. The basic process to identify the 

nodes is the following: 

A Suppose we are given N node variables and that for each variable it is pos- 
sible to generate R uniformly sampled trajectories X n ^ r r = 1, R with 
different initial conditions. Write Y n = [X n ,i, ■■■,X n ,R], G n = \I(X nt \), 
I(X n>R )], n = l,...,N, Gn = [I(X tA X jA ),...,I(X hR X hR )] and H js = 

[ 7 ( ^c+x,-,i )'-' J ( ac+Xj, a )]' s = M, and 5 is the sampling interval 
for the hyperbolic terms. For each n = 1, ...,N: 

B Choose an attenuation coefficient f3 q for the quadratic terms and another 
one, fih, for the hyperbolic terms. Let n g , g = 1, .., G, be discrete random 
vectors normally distributed scaled to have norm 1. Denote by | | the 
2-norm of a vector and let Gi be the matrix whose columns are all the 

C G 

vectors , G q be the matrix whose columns are all possible vectors j^r 2 -^ 

and H be the matrix whose columns are all allowed hyperbolic terms 
pj^y. Let Nq be the matrix whose columns are the random vectors n g 
scaled to have norm 1. Choose G large enough to have the matrix Z = 
[J,Gi, P q G q , PhH , N G ] with small condition number (say less that 10 2 ). 

C Find the minimal l\ solution to the underdetermined system Y n = Za. 

D Choose a threshold T n and let ar„ be the coefficients in a larger than T n . 
Let T n , the estimated set of directed links of node n, be the union of all 
node indexes that appear in terms of Z corresponding to coefficients in 

a T n ■ 

Basically in step A we use the sampled trajectories to estimate the integrals 
in the representation on the right hand side of equation (4). In step B we 
scale the terms of the representation, we attenuate quadratic and hyperbolic 
terms, and we augment the model with scaled random terms. In step C we 
apply l\ minimization. Finally in step D we select the largest coefficients in 
the representation and we estimate the set of links that determine the dynamics 
of each node. The choice of the threshold in step D is very delicate and it is 
explored in depth for the epidermal growth factor receptor signaling network 
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that we study in the next section. We stress again that by no means we need to 
limit ourselves to linear, quadratic and hyperbolic terms, general power function 
expansions or higher degree polynomial terms are possible within the frame of 
this method, since random terms and l\ minimization keep the reconstruction 
stable even for very underdetermined systems of representation. 

3 Results 

3.1 The epidermal growth factor receptor (EGFR) Net- 
work 

In this section we show the performance of the augmented sparse reconstruction 
method A-D on the epidermal growth factor receptor (EGFR) protein network 
described in |Araujo et al. 2005] and explicitly shown in the appendix. We again 
emphasize that the ability to dynamically measure and collect enough data from 
every protein/node within the network is impossible with current experimental 
methodologies. The EGFR network is one of the most well-studied and clinically 
important signaling networks in biology today and the ability of our method to 
reconstruct a model of such fundamental network is very promising. The EGFR 
network has been modelled in |Araujo et al. 2005] as a system involving only 
linear, quadratic and hyperbolic terms, so the general model in equation (1) 
(and therefore in equation (4)) is ideally suited for its analysis, however the 
right hand side of equations (1) and (4) have a very large number of quadratic 
and hyperbolic terms for the EGFR network, both due to the large number of 
variables involved, and to the need of considering a sufficiently large sampling of 
hyperbolic terms. Therefore already in this case we are faced with the extreme 
difficulty of finding the few relevant terms for the actual EGFR network. Despite 
this difficulty, augmented sparse reconstruction is able to find a very significant 
fraction of the links in the network. Moreover preliminary results show that 
the method is robust with respect to changes in the size and type of system of 
representation. 

The sparsity of links for the EGFR system has some variation between nodes; 
we have 11 variables with less than 4 distinct terms (linear, quadratic or hy- 
perbolic) in the expression for their derivative, 9 variables with less than 8 
terms and 1 variable, £4, with 19 terms. This last variable is not sparse and 
corresponds to the main 'hub' of the EGFR network. 

We assume that 100 time series with different initial condition, each of a 
length of 25 points, are available for each variable in the system. Only 500 uni- 
formly selected points among the total 2500 are used in the algorithm, so that 
we are effectively working with a very small data set of points, even though we 
gain some information from the missing points when estimating the integrals in 
equation (4). The initial conditions for each variable are chosen as uniformly 
distributed random numbers in the interval [0, 40]. In real systems the bio- 
logically significant ranges of initial conditions vary among different variables. 
This raises an interesting theoretical and practical question: which is the mini- 
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mal domain of initial conditions that allows the reconstruction of the network? 
This question is particulary relevant for networks that display simple dynamics, 
since each short trajectory may not carry the full information on the underlying 
network. 

The length of the time series is chosen to be consistent with the sampling rate 
that can be performed in practice, that is the reason we take only 25 uniformly 
spaced points in the time interval [0, 27] along each trajectory. To put this 
number in perspective, we note that for a time series with fast initial decay like 
the one in Figure 1(a) this mean that we have only 2 to 3 points in the high 
varying region of the series, while for a time series with slow decay as the one in 
Figure 1 (b) we have a larger proportion of points where the time series has not 
yet relaxed to its steady state. As we already stressed, this infrequent sampling 
is one of the reasons we had to move from the differential representation of the 
network to the integral one used in A-D, as it may be problematic to estimate 
derivatives in such infrequent sampling scenario. The way we add noise to the 
trajectories is by taking the maximum M of each given time series and by adding 
uniform white noise in the interval [—to, to] where to is equal to a fraction of 
M; this seems to give levels of noise consistent with experimental conditions. 
The characteristic shape of the noisy time series from the EGFR network, is 
shown in Figure 1 (noise level 15%). 

The sampling interval of the hyperbolic terms is c = 10 and the total number 
of hyperbolic terms for each variable is S — 50. The total number of terms for 
the model, and therefore the total number of parameters, is 1449, far more than 
the 500 data points we use to find the links for each node. The number of 
random vectors to be used in step B is chosen as G = 2500. The attenuation 
for the quadratic and the hyperbolic terms is chosen to be (3 q = (3h = 0.5. 

In Figure 2 we show a typical example of the sparse representation that can 
be obtained by applying A-D to the infrequently sampled, noisy trajectories 
of the EGFR network with the noise level as in Figure 1. More specifically, 
we show the reconstructed representation for Y 2l the vector of all integrals of 
x 2 defined in step A, with respect to the integral of all linear, quadratic, and 
hyperbolic terms, as defined in step B. We choose variable x 2 because it has very 
few terms in its actual representation of the derivative, namely x 2 = — 0.06x 2 + 
0.2x3 + 0.003a;iX23 — 0.02x2, having a sparsity for which the algorithm works 
often at its best. We plot the norm of the coefficients of: the linear terms in 
Figure 2(a), from G\ to G23; the quadratic terms in Figure 2(b), ordered as 
Gi,i,..., Gi,23, 62,2,-, G22,23! the hyperbolic terms in Figure 2(c), ordered as 
Hi t i,..., -ff^io,..., iJ23,iv, -^23, 10; and the random terms in Figure 2(d). The 
3 largest coefficients across all terms correspond exactly to three of the terms 
in the representation of x 2l namely x 2l X3 and X!X 2 3, we are missing instead 
the x| term. The forth largest coefficient in the reconstructed representation 
corresponds to the x 2 3 term, so it repeats to some extent the information on 
the network linkage given by the X1X23 term. If we apply the reconstruction 
algorithm to this node with 20 different realizations of 15% relative noise, the 
x 2 term appears as dominant 18 times, the X3 term 15 times, the X1X23 term 20 
times and the x 2 term 3 times. Note that the dominance of a term is not only 
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Figure 1: In subplots (a) and (b) we show typical trajectories that are observed 
in the EGFR system, sampled uniformly 25 times in the time interval [0, 27]. 
Starred curves are the actual trajectories, circled curves are the trajectories with 
15% relative noise added. Plot (a) shows a trajectory of xg that settles within 
few samples points to a base value, plot (b) a trajectory of xe with a much 
slower decay. 
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Figure 2: From top left, we plot the norm of the coefficients of Y 2 (as defined 
in step A) for: (a) the linear terms Gf, (b) the quadratic terms G q ; (c) the 
hyperbolic terms H; and (d) the random terms. 
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due to the size of its coefficient, but to the intensity of the corresponding signal 
as well, for example X1X23 has coefficient 0.003 in the equation for x% and yet 
it is recovered more often than x 2 that has larger coefficient 0.02. We believe 
this has to do with the specific norm scaling that is selected in step B of the 
algorithm. Different choices of norm scaling may be useful to improve further 
the performance of the method. 

The example of the reconstruction of the representation of £2 is typical: some 
terms not only may be missing, but they can be partially wrong, for example 
a term Xi may appear in the representation as x 2 , or a term XiXj may be 
replaced by a term XiXk that gives similar shapes for the given initial conditions. 
Note that these two possibilities do carry some significant information on the 
geometry of the network, even though the specific terms are incorrect. 

We can compare at this point the effectiveness of our l\ method in identifying 
the relevant links with respect to simpler techniques such as correlation. In 
Figure 3 we show, for comparison, the correlation Y2 with: the linear terms in 
Figure 3(a), from G± to G23', the quadratic terms in Figure 3(b), ordered as 
Gi,!,..., Gi : 23, ^2,2,—! £22,23; the hyperbolic terms in Figure 3(c), ordered as 
-Hi,!,..., ifi^io,..., #23, 1,---, #23,io- The most negatively correlated linear term 
corresponds to x^ , the most negatively correlated quadratic term to x\ , and the 
cluster of most negatively correlated hyperbolic terms correspond to X2 as well. 
Note however that many terms show similar level of large negative correlation, 
especially among the quadratic terms, so that it is difficult to set a threshold 
on the norm of the correlation coefficients that would, for example, identify 
only X1X23 as another relevant term. The key point is that the considerable 
sparsity of the reconstructions computed by our method allows for an accurate 
distinction of false links and true links. We do not explore this issue further in 
this paper, but see our forthcoming paper for a comparison of the augmented 
sparse reconstruction with I2 regression on a special class of ODE networks. 

To evaluate globally the quality of the reconstruction results for different 
levels of noise we use the ratio of computed true links with respect to the to- 
tal number of true links (true positives rate) and the ratio of computed false 
links with respect to the total number of false links (false positives rate). An 
important question when assessing the quality of reconstruction is the proper 
estimation of the thresholds T n used in D. In general we expect these thresholds 
to vary according to the noise level in the time series, but even the sampling 
rate will affect our degree of confidence in the computed links so we must find 
an automatic way to estimate the threshold from the data. Note moreover that 
the threshold must be represented in terms of the coefficients used to repre- 
sent each node. To this extent we define a threshold, for a given system, as a 
constant multiple of the standard deviation of the non-zero coefficients of the 
non-random terms of each node, what we may call the deterministic coefficients 
of the representation (in practice we neglect any coefficient with norm smaller 
than 10~ 10 ). Formally we define Ti = Kai, i = 1, N, where K is some fixed 

1 D. Napoletani, T. Sauer, Reconstructing the Topology of Sparsely-Connected Dynamical 
Systems, submitted. 
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Figure 3: From top left, we plot the correlation coefficients of Y 2 (as denned 
in step A) with: (a) the linear terms Gi; (b) the quadratic terms G q ; (c) the 
hyperbolic terms H. 
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Figure 4: In plot (a) we have the average true positives rates for relative noise 
in the trajectories from 0% to 25% when the value of the threshold multiplier 
K is artificially set to keep the false positive rate at: 0.1 (starred curve) and 
0.05 (circled curve). In plot (b) we have the average true positive rates (starred 
curve) and average false positive rates (squared curve) for relative noise in the 
trajectories from 0% to 25% where the value of the threshold multiplier K is 
found for each noise level by using the heuristic E. 

constant determined for the whole system, while Ui is the standard deviation of 
the absolute value of the deterministic non-zero coefficients of the representation 
of node i. This flexible definition of the threshold ensures that: a) the threshold 
level is relative to the norm of the coefficients of each node; b) the threshold is 
larger if there are many sizeable non-zero coefficients in the representation of a 
specific node. The main advantage of a uniform definition of threshold across 
all variables is that we need the proper estimation of a single threshold K, and 
we have the whole reconstruction data available to do that. If the network has 
very distinct behavior for different subsets of nodes, it may not be possible to 
use a single multiplier and we must resort to thresholds estimated for each node 
separately. 

Before suggesting a specific way to find the uniform threshold K from the 
computed representations, let us see what we would get with an 'ideal' choice of 
it. Suppose that for each noise level in the time series, we select K so that the 
false positives rate stays below 0.1. We perform such analysis for 20 realizations 
for each level of relative noise in the trajectories from 0% to 25%. In Figure 
4(a) we can see the result of such choice of thresholds: the average true positives 
rate (starred curve) is high (around 0.65) even for realistic trajectories' noise 
of the order of 20%. For all 20 realizations the computed true positive rates 
differed from the average by at most 0.054 units, with standard deviation, at 
each noise level, of at most 0.026. The computed average values of K are: 
K 0% = 0.032, K 5% = 0.046, K w% = 0.067, K 15% = 0.099, K 20% = 0.120, 
^25% = 0.137. The noisier the time series, the higher the value of K needed 
to keep the false positives rate small. Figure 4(a) shows also the average true 
positives rates (circled curve) when the false positives rate is kept at 0.05. In 
this case, for all 20 realizations the computed true positive rates differed from 
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the average by at most 0.071 units, with standard deviation, at each noise 
level, of at most 0.03. The computed average values of K are: K % = 0.104, 
K 5% = 0.129, K w% = 0.185, K 1B% = 0.253, K 20% = 0.315, K 25% = 0.369. 
Even for noiseless trajectories we still do not find all true links, this is due in 
part to the infrequent sampling of the trajectories that hides subtle interactions 
among the nodes. Note that the rates we display are obtained excluding from the 
average the reconstruction of variable 4 that does not have sparse representation 
and which is a 'hub', so likely to be better knwon experimentally. We decided 
to exclude that specific protein in our estimates of the errors because only very 
few proteins are believed to perform the role of hub of a network in signaling 
pathways, therefore we believe that the error rates computed above are a better 
indicator of the errors we would find in computing the representation of a generic 
protein in a large network. Moreover the representation of is so different from 
the others that the use of the same threshold multiplier for it as for all other 
variables does not seem appropriate. Including variable 4 in computing the 
errors, without any modification in the choice of threshold multiplier, would 
lead true positive rates to slightly worsen. 

It is possible in principle to have cases when much finer sampling of the 
trajectories is experimentally available. To simulate this scenario, suppose we 
sample the trajectories of the EGFR network uniformly 100 times in the time 
interval [0, 27], then the error rates improve significantly. If we keep false posi- 
tive rate to 0.1, then average true positive rates are about 0.88 in the absence of 
noise, this is a improvement of almost 0.08 with respect to the same time series 
sampled uniformly only 25 times. In Figure 5(a) we show average true positive 
rates for noise level from 0% to 25% in this fine sampling scenario. We use 20 
realizations for each noise level in computing the averages 

Figure 5(b) shows that a large number of random terms is important for the 
proper functioning of algorithm A-D. Namely we show that the average true 
positive rates improve when we increase the number G of random terms from 
to 3000 in intervals of 250. Trajectories are sampled infrequently (25 times in 
the interval [0, 27]), false positive rates are kept at 0.1 and relative noise level 
is fixed at 15%. The true positives rate is very low, just 0.13, when there are no 
random terms added. Addition of 1000 or more terms gives high true positive 
rates, about 0.68, that are comparable for several values of the number of terms 
G. These results show also the robustness of the algorithm with respect to the 
choice of G. 

3.2 Choice of Threshold 

We now approach the problem of finding a suitable value of K from the recon- 
struction data generated by the algorithm itself at the end of step C. Denote 
by S(K) the total number of selected links that are found in step D of the 
augmented sparse reconstruction algorithm by using thresholds T n = Ka n . We 
can split S(K) as S(K) = S t (K) + S f (K) where S t (K) denotes the number of 
true computed links and Sf(K) the number of false computed links. Since for 
each node we have only a small number of true links by assumption, and their 
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Figure 5: In plot (a) we have the true positives rates (starred curve) for relative 
noise in the trajectories from 0% to 25% when the value of the threshold mul- 
tiplier K is artificially set to keep the false positive rate (squared curve) at 0.1. 
Network trajectories are sampled 100 times in the interval [0, 27] to generate 
this plot. In plot (b) we have the true positive rates (starred curve) as a function 
of the number of random terms added to the model. Network trajectories used 
to generate this plot are sampled 25 times in the interval [0, 27] and noise level 
is kept at 15%. The value of the threshold multiplier K is artificially set to keep 
the false positive rate at 0.1. 



corresponding coefficients in the representation are, in general, very large, we 
can conjecture that, as we let K increase continuously from to oo, S t (K) will 
decrease very slowly at the beginning. Since St{K) assumes only integer val- 
ues, this slow decay will appear as infrequent small jumps, this means that the 
(discontinuous) derivative dS(K) of S(K) will be dominated by the derivative 
dSf(K) of Sf(K) for small values of K and by dSt(K), the derivative of St(K), 
for larger values of K, therefore we can infer some of the properties of S/(K) 
which is not known, from those of S(K), which is a computable function. 

In Figure 6(a) to 6(c) we show approximations to dS t (K), dSf(K) and 
dS(K) for a specific reconstruction with relative noise in the time series of 
the order of 10%. We choose a fine uniform sampling U = 0.001 of K, up to 
K = 3.5, so that dS(K) never goes below —2. To have the ideal case in which 
dS(K) > —1 for all K seems to require excessively fine sampling rate. Note that 
S{K) is identically zero for K > 3.26, we can also immediately see the similarity 
of dS(K) and dSf(K) in the frequency of negative jumps for small values of K. 
The frequency of jumps of dS(K) greatly decreases around K = 0.30, to see this 
transition point more clearly, let K\,..., Km be the values of K, ordered from 
smallest to largest, for which dS(K) ^ and define a function J(i) = K{ — Ki-i, 
i = 2, M that computes the width of negative jumps. We plot J in Figure 6(d) 
and we can see that for i « 57 we suddenly have much wider intervals between 
jumps, this value of i corresponds to K57 « 0.29. We argue that a suitable 
value Kf of the multiplier K is the one for which J(i) has very different local 
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Figure 6: Plots of: (a) dS t {K); (b) dS f (K); (c) dS(K) for 10% relative noise in 
the trajectories. In plot (d) we show J(i), the distance between the i — 1-th and 
the i-th negative jumps in dS(K), for 10% relative noise in the trajectories. K 
is sampled uniformly with sampling interval of size U = 0.001. The arrow in 
plot (d) points to the index values, around i = 57, for which we have a large 
change of mean frequency of jumps. 

averages for i < f and i > f. To find such Kj we can use the following rule: 

E Set an integer I, let V(i) = -j-j-y, i — I, ...,M — /, where we denote by 

Ji-i the mean of J for values between i — I and i and by Ji+i the mean 
of J for values between i and i + 1. Denote by / the index for which V(i) 
is minimum, and by Kf the corresponding threshold multiplier. 

Rule E uses the ratio of the local mean of the length of intervals between jumps 
before and after the i-th jump to select the jump for which the relative increase 
is maximum. If we use I = 20 in E, we find from the output data of step C of 
the algorithm the following threshold multipliers: K % = 0.114, K 5 % = 0.164, 
K w% = 0.328, K 15% = 0.415, K 2Q% = 0.428, K 25% = 0.486. In Figure 4(b) we 
plot the average true positive rates and average false positive rates computed 
by using in step D of the algorithm by using these estimated values of K. 
We use 20 realizations for each noise level in computing the averages, for all 
20 realizations the computed true positive rates differ from the averages by at 
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most 0.11 units, with standard deviation, at each noise level, of at most 0.058. 
The false positives rate is below 0.11 for all realizations and all noise levels. 
With 10% relative noise in the trajectories, we have a significant average true 
positives rate of about 0.61 with average false positives rate of only 0.025. There 
is a greater variability in true positives and false positives rates in Figure4(b) 
with respect to those computed in Figure4(a), this fact points out the need of 
a more sophisticated analysis of the change of mean frequency. For example, 
a wavelet maxima analysis ([Mallat 1998J, chapter 6) of J(i) can be used for a 
more robust evaluation of the thresholds. It would be of great interest to deduce 
theoretically the value of Kf, under suitable conditions on the class of network 
models. 

4 Discussion 

There are still many open questions whose answers will shape the way aug- 
mented sparse reconstruction methods are applied to network reconstruction. 
What is the limiting node sparsity that still allows the network itself to be re- 
covered with this method? how are the error rates affected if only a subset of 
the variables is available? how does the network we compute on this subset of 
variables relates to the full network? if we have a node that is unrelated to 
the chosen subset of the network, it tends to have a greater portion of its norm 
accounted for by the random terms, and this ca be used to decide whether it 
is well connected to the other variables, but further research in this direction is 
needed. 

In some cases the 'skeleton' of the network may be available, for exam- 
ple for proteins networks we may know roughly how the system is connected 
for healthy cells. Can we use this additional information to detect, with this 
method, whether patients with cancer develop additional strong links among 
nodes? Preliminary evidence suggests that small new links that do not make 
the system unstable are often detectable, but it would be interesting to use the 
available information on the skeleton of the network directly in the algorithm. 

Even when previous information on the network is not available, it is very 
important for clinical applications to determine whether a specific protein has 
very distinct representations for cancerous cells and for healthy ones. If this is 
the case, then the reconstruction algorithm can predict changes in the signaling 
pathways that are likely due to the cancer itself. 

One possible extension of our method is to perform the estimation of links 
only on local subsets of trajectories. This local application of the algorithm 
may highlight different links that could be dominant for different sets of initial 
conditions, in H we show that this strategy is indeed feasible. Note that the 
augmented sparse reconstruction scheme has an edge over simple li regression 
especially when there is a very limited set of initial conditions. Even in the case 
in which it is possible to span the entire phase space of the network, there is a 

2 D. Napoletani, T. Sauer, Reconstructing the Topology of Sparsely-Connected Dynamical 
Systems, submitted. 
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limit to the density of the initial conditions that can be taken and therefore a 
local application of this method will be beneficial (because there are only a few 
local trajectories). By putting together the information on links that arise in 
different regions of the phase space it may be possible to find very tenuous links 
that would otherwise be undetectable in a global analysis. A clear advantage of a 
local version of the method is its generality, since simple, low degree polynomial 
models can always be used. 

The augmented sparse reconstruction described in this paper is able to iden- 
tify relevant links among nodes in very large systems of representation and with 
very noisy conditions, so we expect the algorithm to scale well to the use of cu- 
bic or higher degree terms in the representation and even to the use of general 
power functions, possibly with non integer and negative exponents. The use of 
attenuation of blocks of terms in the representation will turn out to be even 
more important when the size of the dictionary of terms is increased. 

We stated in the introduction that a promising approach to biological net- 
works is to deemphasize exact modeling, in favor of a robust identification of 
classes of suitable models. If we take one step forward in this direction, then 
the techniques of network control, and the very notion of global stability of a 
network, must be changed in such a way that they are valid for entire classes 
of indistinguishable systems [Judd et al. 2004] that produce trajectories that 
arc qualitatively similar. In this perspective, The reconstruction algorithm de- 
scribed in this paper could be used as an intermediate step of data-driven control 
schemes based on particle filter techniques, by providing an indistinguishable 
model that locally behaves as the real one. This potential application of the 
augmented sparse reconstruction method would be an interesting step in the 
direction of real time, personalized therapies that require an online estimation 
and control of specific pathways in the cell networks of individual patients, 
[Liotta et al. 2001] . |Araujo et al. 2005 . 



20 



Appendix: the EGFR network 



ii = 0.06x2 — 0.003x 1 x 2 3 
x 2 = -0.06x2 + 0.2a: 3 + 0.003xix 23 - 0.02x1 
x 3 = -l.lx 3 + 0.01x 4 + O.Olx^ 
x 4 = x 3 - O.OI.X4 + 0.2x 5 + 0.3x 6 + 0.05x7 
+ 0.03x8 + 0.6x 9 + 0.3xio + 0.3xii + 0.12xi2 

- 0.0045x 4 xi3 - 0.0009x 4 xi4 - 0.0009x 4 xi 5 

- 0.06x4Xig — O.OO6X4X17 — 0.003x 4 xig — O.O9X4X20 

450x 4 

- 0.00 24x4X22 - — - 

50 + X4 

x 5 = — 1.2x 5 + 0.05x 6 + 0.06x 4 xi6 

X6 = X5 — 0.35x6 + O.OO6X4X17 

x 7 = -0.05x7 + 0.06x 8 - 0.01x7x21 + 0.003x4X19 

x 8 = -0.09x 8 + 0.0045x 4 xi3 + 0.01x 7 x 2 i 

Xg = —6.6X9 + O.O6X10 + 0.09X4X20 

xio = 6x9 — 0.07xio + 0.0009x 4 xi4 

xn = -0.4xn + 0.0214xi2 + 0.0009x 4 xi 5 - 0.01xnx 2 i 

+ 0.003xi O xi9 

X12 = -0. 1843xi2 + 0.00 24x 4 x 22 + 0.009xi Xi 3 + 0.01xnx 2 i 
xi 3 = 0.03x 8 + 0.0429xi2 - 0.0015xi 3 + 0.1x 22 

- 0.0045x 4 xi3 - 0.009xi xi3 - 0.021xi 3 xi 4 
+ O.OOOIX19X21 

X14 = 0.3xio + 0.1xi5 + O.IX22 — 0.0009x 4 xi4 

- 0.021x13x14 - 0.003x M xi9 - j n ?Xli 

340 + x w 

X15 = 0.3xn - 0.1xi5 + 0.064x22 - 0.00 09x 4 xi 5 
+ 0.003xi4Xi9 + 0.03xi 5 x 2 i 

X17 

xi 6 = 0.2x 5 - 0.06x 4 xi 6 + — — 

100 + X17 

X17 



xu — x 6 + xn + xi 8 + x 4 xi7 + 

1 + xn 

%i8 = xn — 0.03xi8 

xig = 0.05x7 + O.lxn + 0.0015xi3 + 0.1xi 5 

— 0.003x4Xi9 — 0.003xioxig — 0.003xi4Xig 

- O.OOOIX19X21 

x 20 = O.6X9 - 0.09x 4 x 20 + , 1 n 7Xl4 

340 + X14 

x 2 i = 0.06x8 + 0.0214xi2 + 0.0015xi3 + 0.064x 22 

— O.OIX7X21 — O.OIX11X21 — 0.03xi5X 2 i 

21 

- O.OOOIX19X21 

x 22 = 0.12xi 2 - 0.064x22 - 0.00 24x 4 x 22 + 0.021xi 3 xi 4 
+ 0.03xi 5 x 2 i 

X23 = 
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